KMID : 1146420220140010041
Korean Medical Association of Clinical Sanghan-Geumgwe
2022 Volume.14 No. 1 p.41 ~ p.50
Selecting Machine Learning Model Based on Natural Language Processing for Shanghanlun Diagnostic System Classification
Kim Young-Nam

Abstract
Objectives : This study aims to identify the most suitable machine learning algorithm for classifying the Shanghanlun diagnostic system using natural language processing (NLP).

Methods : A total of 201 data items were collected from 『Shanghanlun』 and 『Clinical Shanghanlun』. ‘Taeyangbyeong-gyeolhyung’ and ‘Eumyangyeokchahunobokbyeong’ were excluded to prevent oversampling and undersampling. The data were preprocessed with the Twitter Korean tokenizer and used to train logistic regression, ridge regression, lasso regression, naive Bayes classifier, decision tree, and random forest models. Each model was evaluated by accuracy.
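The classification step can be illustrated with one of the six models named above. The following is a minimal sketch, not the author's code: a multinomial naive Bayes classifier with add-one (Laplace) smoothing, scored by accuracy. It assumes whitespace splitting as a stand-in for the Twitter Korean tokenizer, and the names `tokenize`, `MultinomialNaiveBayes`, and `accuracy`, the class labels, and the four training phrases are hypothetical toy data, not items from the study's dataset.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for the Twitter Korean tokenizer used in the study:
    # a plain whitespace split, so multi-word symptoms are hyphenated below.
    return text.split()


class MultinomialNaiveBayes:
    """Multinomial naive Bayes over token counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.n_docs = len(docs)
        self.doc_counts = Counter(labels)  # number of documents per class
        self.token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(tokenize(doc))  # token frequencies per class
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)  # union of all training tokens
        self.total_tokens = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        # log P(c) + sum of log P(token | c); tokens unseen in training are skipped.
        score = math.log(self.doc_counts[c] / self.n_docs)
        denom = self.total_tokens[c] + len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:
                score += math.log((self.token_counts[c][tok] + 1) / denom)
        return score

    def predict(self, doc):
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.classes, key=lambda c: self.log_score(doc, c))


def accuracy(y_true, y_pred):
    # Fraction of predictions that equal the reference label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, NOT taken from the study's dataset.
train_docs = [
    "floating-pulse headache stiff-neck aversion-to-cold",
    "floating-pulse fever aversion-to-wind sweating",
    "bitter-taste dry-throat dizziness",
    "alternating-chills-and-fever bitter-taste chest-fullness",
]
train_labels = ["taeyang", "taeyang", "soyang", "soyang"]
test_docs = ["floating-pulse headache", "bitter-taste dizziness"]
test_labels = ["taeyang", "soyang"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
preds = [model.predict(d) for d in test_docs]
print(preds, accuracy(test_labels, preds))  # → ['taeyang', 'soyang'] 1.0
```

Swapping in any of the other five models changes only the training and prediction logic; the tokenization step and the accuracy metric stay the same.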

Results : Ridge regression and the naive Bayes classifier achieved an accuracy of 0.843, logistic regression and random forest 0.804, decision tree 0.745, and lasso regression 0.608.
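For reference when comparing ridge and lasso regression, their textbook forms minimize the same squared-error loss and differ only in the penalty on the coefficients β, with λ ≥ 0 controlling its strength. The abstract does not state the exact formulation used (for example, which classification variant was applied), so this is a general sketch rather than the study's setup:

$$
\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2,
\qquad
\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
$$

$$
\text{Accuracy} = \frac{\text{number of correctly classified items}}{\text{total number of evaluated items}}
$$

The L1 penalty can drive some coefficients exactly to zero, whereas the L2 penalty shrinks coefficients without eliminating them.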

Conclusions : Ridge regression and the naive Bayes classifier are the most suitable of the six models tested for NLP-based Shanghanlun diagnostic system classification.
Keywords
Artificial intelligence, Machine learning, Natural Language Processing, Shanghanlun, Diagnostic system
Listed journal information
Korea Research Foundation (KCI)